Uczelnia: Akademia Górniczo-Hutnicza im. Stanisława Staszica w Krakowie

Wydział: Wydział Geodezji Górniczej i Inżynierii Środowiska

Kierunek: Geoinformacja

Rok: drugi

Semestr: czwarty

Przedmiot: Przetwarzanie Danych Środowiskowych

Prowadzacy: dr inż. Mateusz Rzeszutek

Autorzy:


1. Cel i zakres projektu


Celem projektu jest zapoznanie się z analizą szeregów czasowych, wybranie nalepszego modelu i prognozy w języku R. Dane będą pochodzić z baz GIOS i ISD NOAA.

  • Stacja GIOS: Radom
  • Stacja NOAA: Kozienice (najbliższa stacja dla Radomia)

2. Pozyskanie danych


Pozyskaliśmy dane z bazy ISD NOAA oraz GIOŚ.

kat_dost <- "/Users/hdebowski/Downloads/R/Dane"
#getMeta(site = "", lat = 51.403611, lon = 21.156667,end.year = "current", plot = T, returnMap = T )
# dane <- importNOAA(code = "124880-99999", year = 2000:2019)
#mapa <- getMeta(site= "KOZIENICE", end.year="current", returnMap=TRUE)
# kozienice <- dane
# kozienice$date <- kozienice$date+3600
# kozienice <-kozienice[c(3,7:14)]
# meta <- gios_metadane(type = "stacje", 
#                       download = F,      #T za pierwsztm razem
#                       path = kat_dost, 
#                       mode = "wb")
#gios_vis(data = meta %>% filter(is.na(data.zamkniecia))) # tylko aktywne stacje
# stanowiska <- gios_metadane(type = "stanowiska", 
#                             download = F,  
#                             path = kat_dost, 
#                             mode = "wb")
# 
# statystyki_podstawowe <- gios_metadane(type = "statystyki", 
#                            download = F, 
#                            path = kat_dost, 
#                            mode = "wb")
# pliki_all <- map2(.x = as.list(zrodlo[,1]),
#                   .y = as.list(zrodlo[,2]), 
#                   .f = gios_download, 
#                   path = kat_dost,
#                   mode = "wb")
# wek <- 2000:2020 %>% 
#  as.character() %>% 
#   paste0(kat_dost, "/", .)
# 
# pliki_all <- map(.x = wek, 
#                   .f = dir)
# names(pliki_all) <- paste0("R",zrodlo[,2])
# names(pliki_all)
# kody <- c("MzRadTochter")
# inp_pm10_1 <- map(.x = pliki_all,
#                   .f = ~ .[str_detect(., pattern = "PM10_1g")]) %>% flatten_chr()
# 
# PM10 <- map_df(.x = inp_pm10_1,
#                .f = gios_read,
#                czas_mu = "1g",
#                path = kat_dost)
# 
# PM10 <- gios_kody(data = PM10, meta = meta)
# 
# unique(PM10$kod) %>% .[str_detect(., "MzRad")]
# Radom <- PM10 %>% filter(kod == "MzRadTochter")
# summary(Radom)
#laczenie
# Radom <- left_join(Radom, kozienice, by = "date")
# summary(Radom)
#postanowiliśmy "okroić" naszą zmienną wybierając tylko te zmienne, które realnie przydadzą nam się w naszej analizie, uznaliśy że przyśpieszy to przebieg naszej analizy oraz ułatwi nam pracę.
# Radom2 <- Radom %>%
#   dplyr::select(date, obs, ws, air_temp, dew_point, atmos_pres, RH, visibility)
# summary(Radom2)
load("/Users/hdebowski/czwartek_dane.RData")

2.1 Sprawdzenie kompletności danych


Sprawdzamy kompletności danych. Po wybraniu kolumn które będą nam przydatne za pomocą funkcji vis_miss wizualizujemy braki danych. Brakujące dane są na poziomie 21.5%.

vis_miss(Radom2, warn_large_data = F)

2.1. Uzupełnienie braków danych


Uzupełniamy braki danych korzystając z algorytmu locf.

dane_gotowe <- na.seadec(
  Radom2,
  algorithm = "locf",
  find_frequency = TRUE,
  maxgap = Inf,
)
vis_miss(dane_gotowe, warn_large_data = F)


2.3. Uśrednianie danych


Uśredniamy dane do postaci średniomiesięcznej oraz wybieramy interesujące nas lata do analizy - 2004-2019.

#usrednianie danych
Radom_m <- dane_gotowe %>%  timeAverage(avg.time = "month")
#Radom_m2 <- Radom2 %>%  timeAverage(avg.time = "month")
#Radom_m2<- Radom_m2 %>% selectByDate(year= 2004:2019)
Radom_m <- Radom_m %>% selectByDate(year= 2004:2019)
#Radom_d <- Radom2 %>%  timeAverage(avg.time = "day")
#Radom_w <- Radom2 %>%  timeAverage(avg.time = "week")
#plot(Radom_m)

#summary(Radom_m)

Po wykonaniu uśredniania zgodnie z powyższymi parametrami nadpiszmy nasze zmienne korzystając z biblioteki tsibble.

#Radom_d <- Radom_d %>%
#  mutate(date = ymd(date)) %>%
#  as_tsibble(index = date)

Radom_m <- Radom_m %>%
  mutate(date = yearmonth(date)) %>%
  as_tsibble(index = date)

3. Wizualizacja szeregów czasowych


3.1 Sezonowość i trend


Dla poniższego wykresu dostrzegalna jest sezonowość (występowanie wzrostów i spadków następuje w sposób cykliczny)

Radom_m %>% autoplot() + ggtitle("Stężenie PM10 w latach 2004-2019")+
  ylab("Stężenie PM10")+
  xlab("Rok")

Na poniższym wykresie zauważalny jest spadek stężenia PM10 w okresach zimowych oraz spadki w sezonie letnim. Wyróżniającymi się latami są 2006 oraz … (kolor oliwkowy)(najwyższe wartości w porównaniu z pozostałymi latami)

gg_season(Radom_m,obs, labels = "both")+ ylab("PM10") +
  xlab("miesiąc")+
  ggtitle("Wykres sezonowy: Stezenie sredniomiesieczne PM10 w latach 2004-2019")

Wykres wskazuje że największe stężenie PM10 występuje w lutym, delikatnie ustępuje mu styczeń jednakże jego średnia jest mocno zawyżona przez bardzo wysokie stężenia występujące w początkowej fazie analizowanego szeregu czasowego. Najmniejsze stężenie występuje w czerwcu. W miesiącach: maj-grudzień, luty, kwiecień widać tendencję spadkową. Styczeń oraz luty prezentują dosyć podobny zakres wartości. Poprawa parametru może wynikać z dużej uwagi przywiązywanej temu zagadnieniu. Jest mocno nagłaśniany (m.in. artykuły informujące o przekroczeniach, darmowa komunikacja gdy normy zostaną przekroczone, elektorniczne tablice pokazujące jakość powietrza i wartość stężeń głównie PM10 - taka informacja jest także przekazywana np. w Krakowiskich autobusach)

gg_subseries(Radom_m, obs, period = "year")


3.2 Biały szum


Seria prawodopodobnie nie jest białym szumem, ponieważ występują dane poza zakresem przerywanych linii. (chyba po prostu bo jest autokorelacja.)

ACF(Radom_m, lag_max = 12) %>% autoplot() + labs(title = "Bialy szum")


3.3 Korelacja


Bardzo wysoka korelacja między dew_point i air_temp. air_temp oraz visibility. Usuwamy temperaturę z analizę regresji ponieważ wysoka korelacja między składnikami może zaburzyć poprawność modeli.

Radom_m %>%
  GGally::ggpairs(columns = 2:8)

Radom_m2 <- subset(Radom_m, select=-c(air_temp))

3.4 Podział zbioru na treningowy i testowy


Dane podzielono na zbiór treningowy (2004-2018) oraz testowy (2019)

train <- Radom_m2 %>% filter(year(date)>=2004 & year(date) <= 2017)
test <- Radom_m2 %>% filter(year(date)>=2018 & year(date) <= 2019)

4. Modele


4.1 Regresja liniowa prosta


Pierwszy model bazujemy na najbardziej na najbardziej skorelowanej zmiennej z PM10 czyli dew_point. R^2 wynosi: 0.4933. Wynik możemy interpretować jako że model wyjaśnia 49,33% przypadków czyli całkiem dobrze.

train %>% ggplot(aes(x = obs, y = dew_point)) + 
  geom_point() + 
  geom_smooth(method = "lm") +
  ggtitle("Regresja liniowa PM10~dew_point")

fit <- train %>% 
  model(TSLM(formula = obs~dew_point))

fit %>% report()
## Series: obs 
## Model: TSLM 
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -20.665  -6.444  -1.541   3.889  84.166 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  46.9423     1.1427   41.08   <2e-16 ***
## dew_point    -1.8179     0.1421  -12.79   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 12.02 on 166 degrees of freedom
## Multiple R-squared: 0.4963,  Adjusted R-squared: 0.4933
## F-statistic: 163.6 on 1 and 166 DF, p-value: < 2.22e-16

4.2 Regresja wieloraka


Następnie budujemy model którego parametrami są wszystkie zmienne. R^2 wynosi: 0.6574. Największy wpływ na model miał ws, dew_point oraz data. Model wyjaśnia 65,86% danych, zauważalny jest spory postęp.

# przyjrzyjmy się 

fit_multi <- train %>% 
  model(TSLM(formula = obs ~ ws  + dew_point + atmos_pres + RH + visibility + date)) 
#unemployment nie jest wazny
fit_multi %>% report()
## Series: obs 
## Model: TSLM 
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -18.807  -4.967  -1.251   3.630  75.134 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -3.735e+02  2.225e+02  -1.679   0.0952 .  
## ws          -1.332e+01  2.062e+00  -6.460 1.18e-09 ***
## dew_point   -1.982e+00  2.632e-01  -7.530 3.42e-12 ***
## atmos_pres   4.912e-01  2.146e-01   2.289   0.0234 *  
## RH           1.671e-01  1.639e-01   1.020   0.3094    
## visibility  -3.233e-04  2.672e-04  -1.210   0.2280    
## date        -3.093e-03  6.813e-04  -4.539 1.10e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 9.882 on 161 degrees of freedom
## Multiple R-squared: 0.6697,  Adjusted R-squared: 0.6574
## F-statistic: 54.42 on 6 and 161 DF, p-value: < 2.22e-16

4.3 Porównanie regresji prostej i wielorakiej


4.3.1 Odwzorowanie danych


Scalamy dane z regresji prostej liniowej oraz regresji wielorakiej.

#RESZTY R NORMALNY RESZT, BRAK WZORCOW, BRAK AUTOKORELACJI, SREDNIA = 0
#jak brak autokorelacji reszt to nie mozemy stosowac przedzialow ufnosci, no chyba ze zrobimy bootstrap

# scalamy dane
bind_rows(fit %>% 
            augment() %>% 
            mutate(nazwa = "liniowy"),
          fit_multi %>% 
            augment() %>% 
            mutate(nazwa = "wieloraki")
) -> dopasowanie

Po nałożeniu modeli na dane możemy zaobserwować jak model radzi sobie z odwzorowaniem danych. Faktycznie model regresji lepiej radzi sobie z odwzorowaniem, zwłaszcza wartości odstających.

dopasowanie %>% 
  ggplot(aes(x = date)) +
  geom_line(aes(y = .fitted,     color = "model")) +
  geom_line(aes(y = obs, color = "dane")) +
  facet_wrap(~nazwa, ncol = 1) + 
  labs(color = "Reprezentacja")

Ponownie lepsza jest regresja wielokara (większe zagęszczenie punktów przy prostej).

dopasowanie %>% 
  ggplot(aes(x = obs, y = .fitted, color = nazwa)) +
  geom_point() +
  geom_abline(intercept = 0, slope = 1) +
  facet_wrap(~nazwa, ncol = 1)


4.3.2 Wykres reszt


Zarówno dla prostej jak i wielorakiej nie występuje autokorelacja reszt. Wariancja jest jednorodna.

fit %>% gg_tsresiduals()

fit_multi %>% gg_tsresiduals()


4.3.3 Testy normalności Shapiro-Wilka


W teście Shapiro-Wilka p-value mniejsze niż 0.05

shapiro.test(fit %>% residuals() %>% pull(.resid))
## 
##  Shapiro-Wilk normality test
## 
## data:  fit %>% residuals() %>% pull(.resid)
## W = 0.76288, p-value = 3.418e-15
ggpubr::ggqqplot(fit %>% residuals() %>% pull(.resid))

shapiro.test(fit_multi %>% residuals() %>% pull(.resid))
## 
##  Shapiro-Wilk normality test
## 
## data:  fit_multi %>% residuals() %>% pull(.resid)
## W = 0.7637, p-value = 3.638e-15
ggpubr::ggqqplot(fit_multi %>% residuals() %>% pull(.resid))


4.4 Regresja prosta z trendem i sezonowoscia


Jako predyktora przy regresji używamy wyłącznie trendu i sezonowości. R^2: 0.4662 czyli całkiem nieźle.

fit_Radom <- Radom_m2 %>% 
  model(TSLM(obs ~ trend() + season()))

fit_Radom %>% report() 
## Series: obs 
## Model: TSLM 
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -25.4899  -5.3774  -0.9006   3.0073  75.6065 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     60.00693    3.34761  17.925  < 2e-16 ***
## trend()         -0.05404    0.01578  -3.424 0.000764 ***
## season()year2    0.72812    4.27655   0.170 0.864999    
## season()year3   -7.54310    4.27664  -1.764 0.079473 .  
## season()year4  -16.64150    4.27678  -3.891 0.000141 ***
## season()year5  -29.76693    4.27699  -6.960 6.21e-11 ***
## season()year6  -32.51124    4.27725  -7.601 1.59e-12 ***
## season()year7  -31.75911    4.27757  -7.425 4.43e-12 ***
## season()year8  -28.86390    4.27795  -6.747 2.02e-10 ***
## season()year9  -19.57353    4.27838  -4.575 8.87e-06 ***
## season()year10 -16.04158    4.27888  -3.749 0.000239 ***
## season()year11 -12.73619    4.27943  -2.976 0.003323 ** 
## season()year12 -10.36126    4.28004  -2.421 0.016484 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 12.1 on 179 degrees of freedom
## Multiple R-squared: 0.5018,  Adjusted R-squared: 0.4684
## F-statistic: 15.03 on 12 and 179 DF, p-value: < 2.22e-16

4.5 Model z serią Fouriera


Dla K=2 wartość R^2 wynosi: 0.4775.

fit_k2 <- Radom_m2 %>% 
  model(TSLM(obs ~ trend() + fourier(K=2)))

fit_k2 %>% report()
## Series: obs 
## Model: TSLM 
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -23.655  -5.945  -1.554   3.237  76.769 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         42.97267    1.73600  24.754  < 2e-16 ***
## trend()             -0.05461    0.01561  -3.499 0.000584 ***
## fourier(K = 2)C1_12 15.18269    1.22143  12.430  < 2e-16 ***
## fourier(K = 2)S1_12 -0.20367    1.22272  -0.167 0.867886    
## fourier(K = 2)C2_12  0.03772    1.22143   0.031 0.975400    
## fourier(K = 2)S2_12  4.34692    1.22163   3.558 0.000474 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 11.97 on 186 degrees of freedom
## Multiple R-squared: 0.4934,  Adjusted R-squared: 0.4797
## F-statistic: 36.23 on 5 and 186 DF, p-value: < 2.22e-16

4.6 Modele podstawowe oraz z transformacjami


Utworzyliśmy wszystkie kombinacje parametrów (wraz z transformacjami: log, log10, potęga) i sprawdziliśmy model dla nich. Najlepszymi kombinacjami sortując po kryterium AIC okazały się modele z transformacją log10, a najlepszym z nich log10(obs) ~ ws + atmos_pres + dew_point + trend() + season(). Bardzo dobre wyniki uzyskały również modele z transformacją log.

list(mod1 = obs ~ dew_point,      
     mod2 = obs ~ atmos_pres,  
     mod3 = obs ~ ws,     
     mod4 = obs ~ visibility,
     mod5 = obs ~ RH,
     mod6 = obs ~ dew_point  +   atmos_pres,  
     mod7 = obs ~ dew_point  +   ws,     
     mod8 = obs ~ dew_point  +   visibility,
     mod9 = obs ~ atmos_pres + ws,     
     mod10 = obs ~ atmos_pres + visibility,
     mod11 = obs ~ RH    + visibility,
     mod12 = obs ~ RH    + atmos_pres,
     mod13 = obs ~ RH    + ws,
     mod14 = obs ~ RH    + dew_point,
     mod15 = obs ~ ws    + visibility,
     mod16 = obs ~ ws     + atmos_pres + visibility,
     mod17 = obs ~ ws     + atmos_pres + dew_point,
     mod18 = obs ~ ws     + atmos_pres + RH,
     mod19 = obs ~ ws     + visibility + dew_point,
     mod20 = obs ~ ws     + visibility  + RH,
     mod21 = obs ~ ws     + dew_point + RH,
     mod22 = obs ~ atmos_pres     + visibility + dew_point,
     mod23 = obs ~ atmos_pres     + visibility + RH,
     mod24 = obs ~ atmos_pres     + dew_point + RH,
     mod25 = obs ~ visibility     + dew_point + RH,
     mod26 = obs ~ ws + atmos_pres + visibility + dew_point,
     mod27 = obs ~ ws + atmos_pres + dew_point + RH,
     mod28 = obs ~ ws + atmos_pres + visibility + RH,
     mod29 = obs ~ ws + dew_point + visibility + RH,
     mod30 = obs ~ dew_point + atmos_pres + RH + visibility,
     mod31 = obs ~ dew_point + atmos_pres + RH + visibility+ws,
     mod32 = obs ~ dew_point + trend() + season(),      
     mod33 = obs ~ atmos_pres  + trend() + season(),  
     mod34 = obs ~ ws + trend() + season(),     
     mod35 = obs ~ visibility + trend() + season(),
     mod36 = obs ~ RH + trend() + season(),
     mod37 = obs ~ dew_point  +   atmos_pres + trend() + season(),  
     mod38 = obs ~ dew_point  +   ws + trend() + season(),     
     mod39 = obs ~ dew_point  +   visibility + trend() + season(),
     mod40 = obs ~ atmos_pres + ws + trend() + season(),     
     mod41 = obs ~ atmos_pres + visibility + trend() + season(),
     mod42 = obs ~ RH    + visibility + trend() + season(),
     mod43 = obs ~ RH    + atmos_pres + trend() + season(),
     mod44 = obs ~ RH    + ws + trend() + season(),
     mod45 = obs ~ RH    + dew_point + trend() + season(),
     mod46 = obs ~ ws    + visibility + trend() + season(),
     mod47 = obs ~ ws     + atmos_pres + visibility + trend() + season(),
     mod48 = obs ~ ws     + atmos_pres + dew_point + trend() + season(),
     mod49 = obs ~ ws     + atmos_pres + RH + trend() + season(),
     mod50 = obs ~ ws     + visibility + dew_point + trend() + season(),
     mod51 = obs ~ ws     + visibility  + RH + trend() + season(),
     mod52 = obs ~ ws     + dew_point + RH + trend() + season(),
     mod53 = obs ~ atmos_pres     + visibility + dew_point + trend() + season(),
     mod54 = obs ~ atmos_pres     + visibility + RH + trend() + season(),
     mod55 = obs ~ atmos_pres     + dew_point + RH + trend() + season(),
     mod56 = obs ~ visibility     + dew_point + RH + trend() + season(),
     mod57 = obs ~ ws + atmos_pres + visibility + dew_point + trend() + season(),
     mod58 = obs ~ ws + atmos_pres + dew_point + RH + trend() + season(),
     mod59 = obs ~ ws + atmos_pres + visibility + RH + trend() + season(),
     mod60 = obs ~ ws + dew_point + visibility + RH + trend() + season(),
     mod61 = obs ~ dew_point + atmos_pres + RH + visibility + trend() + season(),
     mod62 = obs ~ dew_point + atmos_pres + RH + visibility+ws + trend() + season(),
     mod63 = obs^2 ~ dew_point,      
     mod64 = obs^2 ~ atmos_pres,  
     mod65 = obs^2 ~ ws,     
     mod66 = obs^2 ~ visibility,
     mod67 = obs^2 ~ RH,
     mod68 = obs^2 ~ dew_point  +   atmos_pres,  
     mod69 = obs^2 ~ dew_point  +   ws,     
     mod70 = obs^2 ~ dew_point  +   visibility,
     mod71 = obs^2 ~ atmos_pres + ws,     
     mod72 = obs^2 ~ atmos_pres + visibility,
     mod73 = obs^2 ~ RH    + visibility,
     mod74 = obs^2 ~ RH    + atmos_pres,
     mod75 = obs^2 ~ RH    + ws,
     mod76 = obs^2 ~ RH    + dew_point,
     mod78 = obs^2 ~ ws    + visibility,
     mod79 = obs^2 ~ ws     + atmos_pres + visibility,
     mod80 = obs^2 ~ ws     + atmos_pres + dew_point,
     mod81 = obs^2 ~ ws     + atmos_pres + RH,
     mod82 = obs^2 ~ ws     + visibility + dew_point,
     mod83 = obs^2 ~ ws     + visibility  + RH,
     mod84 = obs^2 ~ ws     + dew_point + RH,
     mod85 = obs^2 ~ atmos_pres     + visibility + dew_point,
     mod86 = obs^2 ~ atmos_pres     + visibility + RH,
     mod87 = obs^2 ~ atmos_pres     + dew_point + RH,
     mod88 = obs^2 ~ visibility     + dew_point + RH,
     mod89 = obs^2 ~ ws + atmos_pres + visibility + dew_point,
     mod90 = obs^2 ~ ws + atmos_pres + dew_point + RH,
     mod91 = obs^2 ~ ws + atmos_pres + visibility + RH,
     mod92 = obs^2 ~ ws + dew_point + visibility + RH,
     mod93 = obs^2 ~ dew_point + atmos_pres + RH + visibility,
     mod94 = obs^2 ~ dew_point + atmos_pres + RH + visibility+ws,
     mod95 = obs^2 ~ dew_point + trend() + season(),      
     mod96 = obs^2 ~ atmos_pres  + trend() + season(),  
     mod97 = obs^2 ~ ws + trend() + season(),     
     mod98 = obs^2 ~ visibility + trend() + season(),
     mod99 = obs^2 ~ RH + trend() + season(),
     mod100 = obs^2 ~ dew_point  +   atmos_pres + trend() + season(),  
     mod38_K = obs^2 ~ dew_point  +   ws + trend() + season(),     
     mod39_K = obs^2 ~ dew_point  +   visibility + trend() + season(),
     mod40_K = obs^2 ~ atmos_pres + ws + trend() + season(),     
     mod41_K = obs^2 ~ atmos_pres + visibility + trend() + season(),
     mod42_K = obs^2 ~ RH    + visibility + trend() + season(),
     mod43_K = obs^2 ~ RH    + atmos_pres + trend() + season(),
     mod44_K = obs^2 ~ RH    + ws + trend() + season(),
     mod45_K = obs^2 ~ RH    + dew_point + trend() + season(),
     mod46_K = obs^2 ~ ws    + visibility + trend() + season(),
     mod47_K = obs^2 ~ ws     + atmos_pres + visibility + trend() + season(),
     mod48_K = obs^2 ~ ws     + atmos_pres + dew_point + trend() + season(),
     mod49_K = obs^2 ~ ws     + atmos_pres + RH + trend() + season(),
     mod50_K = obs^2 ~ ws     + visibility + dew_point + trend() + season(),
     mod51_K = obs^2 ~ ws     + visibility  + RH + trend() + season(),
     mod52_K = obs^2 ~ ws     + dew_point + RH + trend() + season(),
     mod53_K = obs^2 ~ atmos_pres     + visibility + dew_point + trend() + season(),
     mod54_K = obs^2 ~ atmos_pres     + visibility + RH + trend() + season(),
     mod55_K = obs^2 ~ atmos_pres     + dew_point + RH + trend() + season(),
     mod56_K = obs^2 ~ visibility     + dew_point + RH + trend() + season(),
     mod57_K = obs^2 ~ ws + atmos_pres + visibility + dew_point + trend() + season(),
     mod58_K = obs^2 ~ ws + atmos_pres + dew_point + RH + trend() + season(),
     mod59_K = obs^2 ~ ws + atmos_pres + visibility + RH + trend() + season(),
     mod60_K = obs^2 ~ ws + dew_point + visibility + RH + trend() + season(),
     mod61_K = obs^2 ~ dew_point + atmos_pres + RH + visibility + trend() + season(),
     mod62_K = obs^2 ~ dew_point + atmos_pres + RH + visibility+ws + trend() + season(),
     mod1_log = log(obs) ~ dew_point,      
     mod2_log =  log(obs)  ~ atmos_pres,  
     mod3_log =  log(obs)  ~ ws,     
     mod4_log =  log(obs)  ~ visibility,
     mod5_log =  log(obs)  ~ RH,
     mod6_log =  log(obs)  ~ dew_point  +   atmos_pres,  
     mod7_log =  log(obs)  ~ dew_point  +   ws,     
     mod8_log =  log(obs)  ~ dew_point  +   visibility,
     mod9_log =  log(obs)  ~ atmos_pres + ws,     
     mod10_log =  log(obs)  ~ atmos_pres + visibility,
     mod11_log =  log(obs)  ~ RH    + visibility,
     mod12_log =  log(obs)  ~ RH    + atmos_pres,
     mod13_log =  log(obs)  ~ RH    + ws,
     mod14_log =  log(obs)  ~ RH    + dew_point,
     mod15_log =  log(obs)  ~ ws    + visibility,
     mod16_log =  log(obs)  ~ ws     + atmos_pres + visibility,
     mod17_log =  log(obs)  ~ ws     + atmos_pres + dew_point,
     mod18_log =  log(obs)  ~ ws     + atmos_pres + RH,
     mod19_log =  log(obs)  ~ ws     + visibility + dew_point,
     mod20_log =  log(obs)  ~ ws     + visibility  + RH,
     mod21_log =  log(obs)  ~ ws     + dew_point + RH,
     mod22_log =  log(obs)  ~ atmos_pres     + visibility + dew_point,
     mod23_log =  log(obs)  ~ atmos_pres     + visibility + RH,
     mod24_log =  log(obs)  ~ atmos_pres     + dew_point + RH,
     mod25_log =  log(obs)  ~ visibility     + dew_point + RH,
     mod26_log =  log(obs)  ~ ws + atmos_pres + visibility + dew_point,
     mod27_log =  log(obs)  ~ ws + atmos_pres + dew_point + RH,
     mod28_log =  log(obs)  ~ ws + atmos_pres + visibility + RH,
     mod29_log =  log(obs)  ~ ws + dew_point + visibility + RH,
     mod30_log =  log(obs)  ~ dew_point + atmos_pres + RH + visibility,
     mod31_log =  log(obs)  ~ dew_point + atmos_pres + RH + visibility+ws,
     mod32_log =  log(obs)  ~ dew_point + trend() + season(),      
     mod33_log =  log(obs)  ~ atmos_pres  + trend() + season(),  
     mod34_log =  log(obs)  ~ ws + trend() + season(),     
     mod35_log =  log(obs)  ~ visibility + trend() + season(),
     mod36_log =  log(obs)  ~ RH + trend() + season(),
     mod37_log =  log(obs)  ~ dew_point  +   atmos_pres + trend() + season(),  
     mod38_log =  log(obs)  ~ dew_point  +   ws + trend() + season(),     
     mod39_log =  log(obs)  ~ dew_point  +   visibility + trend() + season(),
     mod40_log =  log(obs)  ~ atmos_pres + ws + trend() + season(),     
     mod41_log =  log(obs)  ~ atmos_pres + visibility + trend() + season(),
     mod42_log =  log(obs)  ~ RH    + visibility + trend() + season(),
     mod43_log =  log(obs)  ~ RH    + atmos_pres + trend() + season(),
     mod44_log =  log(obs)  ~ RH    + ws + trend() + season(),
     mod45_log =  log(obs)  ~ RH    + dew_point + trend() + season(),
     mod46_log =  log(obs)  ~ ws    + visibility + trend() + season(),
     mod47_log =  log(obs)  ~ ws     + atmos_pres + visibility + trend() + season(),
     mod48_log =  log(obs)  ~ ws     + atmos_pres + dew_point + trend() + season(),
     mod49_log =  log(obs)  ~ ws     + atmos_pres + RH + trend() + season(),
     mod50_log =  log(obs)  ~ ws     + visibility + dew_point + trend() + season(),
     mod51_log =  log(obs)  ~ ws     + visibility  + RH + trend() + season(),
     mod52_log =  log(obs)  ~ ws     + dew_point + RH + trend() + season(),
     mod53_log =  log(obs)  ~ atmos_pres     + visibility + dew_point + trend() + season(),
     mod54_log =  log(obs)  ~ atmos_pres     + visibility + RH + trend() + season(),
     mod55_log =  log(obs)  ~ atmos_pres     + dew_point + RH + trend() + season(),
     mod56_log =  log(obs)  ~ visibility     + dew_point + RH + trend() + season(),
     mod57_log =  log(obs)  ~ ws + atmos_pres + visibility + dew_point + trend() + season(),
     mod58_log =  log(obs)  ~ ws + atmos_pres + dew_point + RH + trend() + season(),
     mod59_log =  log(obs)  ~ ws + atmos_pres + visibility + RH + trend() + season(),
     mod60_log =  log(obs)  ~ ws + dew_point + visibility + RH + trend() + season(),
     mod61_log =  log(obs)  ~ dew_point + atmos_pres + RH + visibility + trend() + season(),
     mod62_log =  log(obs)  ~ dew_point + atmos_pres + RH + visibility+ws + trend() + season(),
     mod1_log10 = log10(obs) ~ dew_point,      
     mod2_log10 =  log10(obs)  ~ atmos_pres,  
     mod3_log10 =  log10(obs)  ~ ws,     
     mod4_log10 =  log10(obs)  ~ visibility,
     mod5_log10 =  log10(obs)  ~ RH,
     mod6_log10 =  log10(obs)  ~ dew_point  +   atmos_pres,  
     mod7_log10 =  log10(obs)  ~ dew_point  +   ws,     
     mod8_log10 =  log10(obs)  ~ dew_point  +   visibility,
     mod9_log10 =  log10(obs)  ~ atmos_pres + ws,     
     mod10_log10 =  log10(obs)  ~ atmos_pres + visibility,
     mod11_log10 =  log10(obs)  ~ RH    + visibility,
     mod12_log10 =  log10(obs)  ~ RH    + atmos_pres,
     mod13_log10 =  log10(obs)  ~ RH    + ws,
     mod14_log10 =  log10(obs)  ~ RH    + dew_point,
     mod15_log10 =  log10(obs)  ~ ws    + visibility,
     mod16_log10 =  log10(obs)  ~ ws     + atmos_pres + visibility,
     mod17_log10 =  log10(obs)  ~ ws     + atmos_pres + dew_point,
     mod18_log10 =  log10(obs)  ~ ws     + atmos_pres + RH,
     mod19_log10 =  log10(obs)  ~ ws     + visibility + dew_point,
     mod20_log10 =  log10(obs)  ~ ws     + visibility  + RH,
     mod21_log10 =  log10(obs)  ~ ws     + dew_point + RH,
     mod22_log10 =  log10(obs)  ~ atmos_pres     + visibility + dew_point,
     mod23_log10 =  log10(obs)  ~ atmos_pres     + visibility + RH,
     mod24_log10 =  log10(obs)  ~ atmos_pres     + dew_point + RH,
     mod25_log10 =  log10(obs)  ~ visibility     + dew_point + RH,
     mod26_log10 =  log10(obs)  ~ ws + atmos_pres + visibility + dew_point,
     mod27_log10 =  log10(obs)  ~ ws + atmos_pres + dew_point + RH,
     mod28_log10 =  log10(obs)  ~ ws + atmos_pres + visibility + RH,
     mod29_log10 =  log10(obs)  ~ ws + dew_point + visibility + RH,
     mod30_log10 =  log10(obs)  ~ dew_point + atmos_pres + RH + visibility,
     mod31_log10 =  log10(obs)  ~ dew_point + atmos_pres + RH + visibility+ws,
     mod32_log10 =  log10(obs)  ~ dew_point + trend() + season(),      
     mod33_log10 =  log10(obs)  ~ atmos_pres  + trend() + season(),  
     mod34_log10 =  log10(obs)  ~ ws + trend() + season(),     
     mod35_log10 =  log10(obs)  ~ visibility + trend() + season(),
     mod36_log10 =  log10(obs)  ~ RH + trend() + season(),
     mod37_log10 =  log10(obs)  ~ dew_point  +   atmos_pres + trend() + season(),  
     mod38_log10 =  log10(obs)  ~ dew_point  +   ws + trend() + season(),     
     mod39_log10 =  log10(obs)  ~ dew_point  +   visibility + trend() + season(),
     mod40_log10 =  log10(obs)  ~ atmos_pres + ws + trend() + season(),     
     mod41_log10 =  log10(obs)  ~ atmos_pres + visibility + trend() + season(),
     mod42_log10 =  log10(obs)  ~ RH    + visibility + trend() + season(),
     mod43_log10 =  log10(obs)  ~ RH    + atmos_pres + trend() + season(),
     mod44_log10 =  log10(obs)  ~ RH    + ws + trend() + season(),
     mod45_log10 =  log10(obs)  ~ RH    + dew_point + trend() + season(),
     mod46_log10 =  log10(obs)  ~ ws    + visibility + trend() + season(),
     mod47_log10 =  log10(obs)  ~ ws     + atmos_pres + visibility + trend() + season(),
     mod48_log10 =  log10(obs)  ~ ws     + atmos_pres + dew_point + trend() + season(),
     mod49_log10 =  log10(obs)  ~ ws     + atmos_pres + RH + trend() + season(),
     mod50_log10 =  log10(obs)  ~ ws     + visibility + dew_point + trend() + season(),
     mod51_log10 =  log10(obs)  ~ ws     + visibility  + RH + trend() + season(),
     mod52_log10 =  log10(obs)  ~ ws     + dew_point + RH + trend() + season(),
     mod53_log10 =  log10(obs)  ~ atmos_pres     + visibility + dew_point + trend() + season(),
     mod54_log10 =  log10(obs)  ~ atmos_pres     + visibility + RH + trend() + season(),
     mod55_log10 =  log10(obs)  ~ atmos_pres     + dew_point + RH + trend() + season(),
     mod56_log10 =  log10(obs)  ~ visibility     + dew_point + RH + trend() + season(),
     mod57_log10 =  log10(obs)  ~ ws + atmos_pres + visibility + dew_point + trend() + season(),
     mod58_log10 =  log10(obs)  ~ ws + atmos_pres + dew_point + RH + trend() + season(),
     mod59_log10 =  log10(obs)  ~ ws + atmos_pres + visibility + RH + trend() + season(),
     mod60_log10 =  log10(obs)  ~ ws + dew_point + visibility + RH + trend() + season(),
     mod61_log10 =  log10(obs)  ~ dew_point + atmos_pres + RH + visibility + trend() + season(),
     mod62_log10 =  log10(obs)  ~ dew_point + atmos_pres + RH + visibility+ws + trend() + season())-> formy

map(formy, as.formula)  -> formy    

out <- map(.x = formy, 
           .f = ~model(train, 
                       TSLM(formula = .x)) %>% 
             glance() 
) %>% 
  do.call(rbind, .) %>%  
  dplyr::select(.model, adj_r_squared, CV, AIC, AICc, BIC)  

out %>% 
  mutate(.model = formy) %>% 
  arrange(AIC) %>% 
  knitr::kable(digits = 2)
.model adj_r_squared CV AIC AICc BIC
log10(obs) ~ ws + atmos_pres + dew_point + trend() + season() 0.77 0.01 -822.58 -818.50 -769.47
log10(obs) ~ ws + atmos_pres + visibility + dew_point + trend() + , season() 0.77 0.01 -822.53 -817.94 -766.30
log10(obs) ~ dew_point + atmos_pres + RH + visibility + ws + , trend() + season() 0.77 0.01 -822.03 -816.89 -762.67
log10(obs) ~ ws + dew_point + visibility + RH + trend() + season() 0.77 0.01 -821.53 -816.94 -765.30
log10(obs) ~ ws + atmos_pres + dew_point + RH + trend() + season() 0.76 0.01 -820.92 -816.33 -764.69
log10(obs) ~ dew_point + ws + trend() + season() 0.76 0.01 -820.35 -816.75 -770.37
log10(obs) ~ ws + dew_point + RH + trend() + season() 0.76 0.01 -820.11 -816.03 -767.00
log10(obs) ~ ws + visibility + dew_point + trend() + season() 0.76 0.01 -819.72 -815.64 -766.61
log10(obs) ~ ws + atmos_pres + visibility + RH + trend() + season() 0.76 0.01 -817.56 -812.97 -761.33
log10(obs) ~ ws + visibility + RH + trend() + season() 0.76 0.01 -816.69 -812.61 -763.59
log10(obs) ~ ws + atmos_pres + visibility + trend() + season() 0.76 0.01 -815.27 -811.19 -762.16
log10(obs) ~ atmos_pres + ws + trend() + season() 0.75 0.01 -814.76 -811.16 -764.77
log10(obs) ~ ws + atmos_pres + RH + trend() + season() 0.75 0.01 -814.61 -810.53 -761.51
log10(obs) ~ RH + ws + trend() + season() 0.75 0.01 -813.25 -809.65 -763.26
log10(obs) ~ ws + trend() + season() 0.74 0.01 -809.86 -806.70 -763.00
log10(obs) ~ ws + visibility + trend() + season() 0.75 0.01 -809.60 -806.00 -759.62
log10(obs) ~ dew_point + atmos_pres + RH + visibility + ws 0.73 0.01 -807.81 -807.11 -785.94
log10(obs) ~ ws + dew_point + visibility + RH 0.73 0.01 -806.31 -805.79 -787.56
log10(obs) ~ ws + atmos_pres + visibility + dew_point 0.72 0.01 -803.57 -803.05 -784.83
log10(obs) ~ ws + visibility + dew_point 0.71 0.01 -800.62 -800.25 -785.00
log10(obs) ~ dew_point + atmos_pres + RH + visibility 0.70 0.01 -794.09 -793.57 -775.34
log10(obs) ~ atmos_pres + visibility + dew_point 0.70 0.01 -790.38 -790.01 -774.76
log10(obs) ~ atmos_pres + visibility + dew_point + trend() + , season() 0.71 0.01 -787.95 -783.87 -734.84
log10(obs) ~ visibility + dew_point + RH 0.69 0.01 -787.64 -787.27 -772.02
log10(obs) ~ dew_point + atmos_pres + RH + visibility + trend() + , season() 0.71 0.01 -786.27 -781.68 -730.04
log10(obs) ~ dew_point + visibility 0.68 0.01 -781.82 -781.57 -769.32
log10(obs) ~ visibility + dew_point + RH + trend() + season() 0.70 0.01 -780.48 -776.40 -727.38
log10(obs) ~ dew_point + visibility + trend() + season() 0.70 0.01 -779.60 -776.00 -729.62
log10(obs) ~ atmos_pres + visibility + RH + trend() + season() 0.69 0.01 -776.76 -772.68 -723.65
log10(obs) ~ ws + atmos_pres + visibility + RH 0.67 0.01 -776.51 -775.99 -757.77
log10(obs) ~ atmos_pres + visibility + RH 0.67 0.01 -776.43 -776.06 -760.81
log10(obs) ~ atmos_pres + visibility + trend() + season() 0.69 0.01 -776.15 -772.55 -726.17
log10(obs) ~ ws + visibility + RH 0.66 0.01 -772.78 -772.41 -757.16
log10(obs) ~ ws + atmos_pres + dew_point + RH 0.66 0.01 -771.46 -770.94 -752.72
log10(obs) ~ RH + visibility 0.66 0.01 -770.63 -770.38 -758.13
log10(obs) ~ atmos_pres + dew_point + RH + trend() + season() 0.68 0.01 -770.24 -766.16 -717.13
log10(obs) ~ dew_point + atmos_pres + trend() + season() 0.68 0.01 -769.81 -766.21 -719.83
log10(obs) ~ RH + visibility + trend() + season() 0.68 0.01 -769.46 -765.86 -719.48
log10(obs) ~ ws + atmos_pres + dew_point 0.65 0.01 -767.57 -767.20 -751.95
log10(obs) ~ ws + dew_point + RH 0.65 0.01 -765.97 -765.60 -750.35
log10(obs) ~ atmos_pres + visibility 0.64 0.01 -765.02 -764.78 -752.53
log10(obs) ~ ws + atmos_pres + visibility 0.64 0.01 -763.82 -763.45 -748.20
log10(obs) ~ dew_point + ws 0.64 0.01 -762.77 -762.53 -750.28
log10(obs) ~ dew_point + trend() + season() 0.66 0.01 -762.49 -759.33 -715.63
log10(obs) ~ visibility + trend() + season() 0.66 0.01 -761.81 -758.65 -714.95
log10(obs) ~ atmos_pres + dew_point + RH 0.64 0.01 -760.74 -760.37 -745.12
log10(obs) ~ RH + dew_point + trend() + season() 0.66 0.01 -760.67 -757.07 -710.69
log10(obs) ~ dew_point + atmos_pres 0.63 0.01 -757.25 -757.01 -744.76
log10(obs) ~ ws + visibility 0.63 0.01 -756.53 -756.28 -744.03
log10(obs) ~ visibility 0.62 0.01 -756.09 -755.94 -746.72
log10(obs) ~ atmos_pres + trend() + season() 0.64 0.01 -753.19 -750.03 -706.33
log10(obs) ~ RH + atmos_pres + trend() + season() 0.64 0.01 -751.59 -747.99 -701.61
log10(obs) ~ RH + dew_point 0.61 0.01 -749.11 -748.87 -736.62
log10(obs) ~ dew_point 0.60 0.01 -746.63 -746.49 -737.26
log10(obs) ~ RH + trend() + season() 0.61 0.01 -738.03 -734.87 -691.17
log10(obs) ~ ws + atmos_pres + RH 0.34 0.02 -658.92 -658.55 -643.30
log10(obs) ~ RH + atmos_pres 0.29 0.02 -649.71 -649.46 -637.21
log10(obs) ~ RH + ws 0.22 0.02 -633.76 -633.51 -621.26
log10(obs) ~ atmos_pres + ws 0.22 0.02 -633.10 -632.85 -620.60
log10(obs) ~ RH 0.20 0.02 -628.74 -628.59 -619.36
log10(obs) ~ atmos_pres 0.12 0.03 -613.61 -613.46 -604.24
log10(obs) ~ ws 0.08 0.03 -606.82 -606.67 -597.45
log(obs) ~ ws + atmos_pres + dew_point + trend() + season() 0.77 0.04 -542.35 -538.27 -489.24
log(obs) ~ ws + atmos_pres + visibility + dew_point + trend() + , season() 0.77 0.04 -542.30 -537.71 -486.07
log(obs) ~ dew_point + atmos_pres + RH + visibility + ws + trend() + , season() 0.77 0.04 -541.79 -536.66 -482.44
log(obs) ~ ws + dew_point + visibility + RH + trend() + season() 0.77 0.04 -541.30 -536.71 -485.07
log(obs) ~ ws + atmos_pres + dew_point + RH + trend() + season() 0.76 0.04 -540.69 -536.10 -484.46
log(obs) ~ dew_point + ws + trend() + season() 0.76 0.04 -540.11 -536.51 -490.13
log(obs) ~ ws + dew_point + RH + trend() + season() 0.76 0.04 -539.88 -535.80 -486.77
log(obs) ~ ws + visibility + dew_point + trend() + season() 0.76 0.04 -539.48 -535.40 -486.37
log(obs) ~ ws + atmos_pres + visibility + RH + trend() + season() 0.76 0.04 -537.32 -532.73 -481.09
log(obs) ~ ws + visibility + RH + trend() + season() 0.76 0.04 -536.46 -532.38 -483.35
log(obs) ~ ws + atmos_pres + visibility + trend() + season() 0.76 0.04 -535.04 -530.96 -481.93
log(obs) ~ atmos_pres + ws + trend() + season() 0.75 0.04 -534.52 -530.92 -484.54
log(obs) ~ ws + atmos_pres + RH + trend() + season() 0.75 0.04 -534.38 -530.30 -481.27
log(obs) ~ RH + ws + trend() + season() 0.75 0.04 -533.01 -529.41 -483.03
log(obs) ~ ws + trend() + season() 0.74 0.04 -529.62 -526.46 -482.76
log(obs) ~ ws + visibility + trend() + season() 0.75 0.04 -529.37 -525.76 -479.38
log(obs) ~ dew_point + atmos_pres + RH + visibility + ws 0.73 0.04 -527.57 -526.87 -505.70
log(obs) ~ ws + dew_point + visibility + RH 0.73 0.04 -526.07 -525.55 -507.33
log(obs) ~ ws + atmos_pres + visibility + dew_point 0.72 0.04 -523.34 -522.81 -504.59
log(obs) ~ ws + visibility + dew_point 0.71 0.04 -520.39 -520.02 -504.77
log(obs) ~ dew_point + atmos_pres + RH + visibility 0.70 0.05 -513.85 -513.33 -495.11
log(obs) ~ atmos_pres + visibility + dew_point 0.70 0.05 -510.14 -509.77 -494.52
log(obs) ~ atmos_pres + visibility + dew_point + trend() + season() 0.71 0.05 -507.72 -503.64 -454.61
log(obs) ~ visibility + dew_point + RH 0.69 0.05 -507.41 -507.04 -491.79
log(obs) ~ dew_point + atmos_pres + RH + visibility + trend() + , season() 0.71 0.05 -506.04 -501.45 -449.81
log(obs) ~ dew_point + visibility 0.68 0.05 -501.58 -501.34 -489.09
log(obs) ~ visibility + dew_point + RH + trend() + season() 0.70 0.05 -500.25 -496.17 -447.14
log(obs) ~ dew_point + visibility + trend() + season() 0.70 0.05 -499.37 -495.77 -449.39
log(obs) ~ atmos_pres + visibility + RH + trend() + season() 0.69 0.05 -496.53 -492.45 -443.42
log(obs) ~ ws + atmos_pres + visibility + RH 0.67 0.05 -496.28 -495.75 -477.53
log(obs) ~ atmos_pres + visibility + RH 0.67 0.05 -496.20 -495.83 -480.58
log(obs) ~ atmos_pres + visibility + trend() + season() 0.69 0.05 -495.92 -492.32 -445.94
log(obs) ~ ws + visibility + RH 0.66 0.05 -492.54 -492.17 -476.92
log(obs) ~ ws + atmos_pres + dew_point + RH 0.66 0.05 -491.23 -490.71 -472.49
log(obs) ~ RH + visibility 0.66 0.05 -490.39 -490.15 -477.90
log(obs) ~ atmos_pres + dew_point + RH + trend() + season() 0.68 0.05 -490.00 -485.92 -436.90
log(obs) ~ dew_point + atmos_pres + trend() + season() 0.68 0.05 -489.57 -485.97 -439.59
log(obs) ~ RH + visibility + trend() + season() 0.68 0.05 -489.23 -485.62 -439.24
log(obs) ~ ws + atmos_pres + dew_point 0.65 0.05 -487.33 -486.96 -471.71
log(obs) ~ ws + dew_point + RH 0.65 0.05 -485.73 -485.36 -470.11
log(obs) ~ atmos_pres + visibility 0.64 0.06 -484.79 -484.54 -472.29
log(obs) ~ ws + atmos_pres + visibility 0.64 0.06 -483.58 -483.21 -467.96
log(obs) ~ dew_point + ws 0.64 0.06 -482.54 -482.29 -470.04
log(obs) ~ dew_point + trend() + season() 0.66 0.06 -482.26 -479.10 -435.40
log(obs) ~ visibility + trend() + season() 0.66 0.06 -481.58 -478.42 -434.72
log(obs) ~ atmos_pres + dew_point + RH 0.64 0.06 -480.50 -480.13 -464.88
log(obs) ~ RH + dew_point + trend() + season() 0.66 0.06 -480.44 -476.83 -430.45
log(obs) ~ dew_point + atmos_pres 0.63 0.06 -477.02 -476.77 -464.52
log(obs) ~ ws + visibility 0.63 0.06 -476.29 -476.05 -463.80
log(obs) ~ visibility 0.62 0.06 -475.86 -475.71 -466.48
log(obs) ~ atmos_pres + trend() + season() 0.64 0.06 -472.96 -469.80 -426.10
log(obs) ~ RH + atmos_pres + trend() + season() 0.64 0.06 -471.36 -467.76 -421.38
log(obs) ~ RH + dew_point 0.61 0.06 -468.88 -468.63 -456.38
log(obs) ~ dew_point 0.60 0.06 -466.40 -466.25 -457.03
log(obs) ~ RH + trend() + season() 0.61 0.07 -457.79 -454.63 -410.93
log(obs) ~ ws + atmos_pres + RH 0.34 0.10 -378.68 -378.31 -363.06
log(obs) ~ RH + atmos_pres 0.29 0.11 -369.47 -369.23 -356.98
log(obs) ~ RH + ws 0.22 0.12 -353.52 -353.28 -341.03
log(obs) ~ atmos_pres + ws 0.22 0.12 -352.86 -352.62 -340.37
log(obs) ~ RH 0.20 0.12 -348.50 -348.35 -339.13
log(obs) ~ atmos_pres 0.12 0.14 -333.37 -333.23 -324.00
log(obs) ~ ws 0.08 0.14 -326.58 -326.44 -317.21
obs ~ ws + atmos_pres + dew_point + RH + trend() + season() 0.67 110.05 783.69 788.28 839.92
obs ~ ws + atmos_pres + dew_point + trend() + season() 0.66 109.12 784.73 788.81 837.84
obs ~ ws + atmos_pres + visibility + dew_point + trend() + season() 0.66 108.80 784.88 789.47 841.11
obs ~ dew_point + atmos_pres + RH + visibility + ws + trend() + , season() 0.66 110.60 785.12 790.26 844.48
obs ~ dew_point + ws + trend() + season() 0.65 110.17 787.54 791.14 837.52
obs ~ ws + visibility + dew_point + trend() + season() 0.65 110.15 788.29 792.37 841.39
obs ~ ws + dew_point + RH + trend() + season() 0.65 111.76 789.00 793.08 842.11
obs ~ ws + dew_point + visibility + RH + trend() + season() 0.65 112.03 790.18 794.77 846.41
obs ~ ws + atmos_pres + visibility + dew_point 0.62 115.28 795.58 796.10 814.32
obs ~ dew_point + atmos_pres + RH + visibility + ws 0.62 116.14 796.76 797.46 818.63
obs ~ ws + atmos_pres + visibility + trend() + season() 0.64 116.65 796.87 800.95 849.97
obs ~ atmos_pres + ws + trend() + season() 0.63 117.49 797.37 800.97 847.35
obs ~ ws + visibility + dew_point 0.61 116.15 798.52 798.89 814.14
obs ~ ws + atmos_pres + visibility + RH + trend() + season() 0.63 119.33 798.86 803.45 855.09
obs ~ ws + atmos_pres + RH + trend() + season() 0.63 119.89 798.98 803.06 852.09
obs ~ ws + dew_point + visibility + RH 0.61 116.37 799.05 799.57 817.80
obs ~ ws + trend() + season() 0.62 119.85 803.51 806.67 850.37
obs ~ ws + visibility + trend() + season() 0.62 119.45 803.84 807.44 853.82
obs ~ ws + visibility + RH + trend() + season() 0.62 120.24 804.44 808.52 857.55
obs ~ RH + ws + trend() + season() 0.62 121.36 805.10 808.70 855.08
obs ~ ws + atmos_pres + dew_point + RH 0.58 124.55 808.98 809.50 827.72
obs ~ dew_point + atmos_pres + RH + visibility + trend() + season() 0.60 130.08 812.20 816.79 868.43
obs ~ ws + atmos_pres + dew_point 0.57 126.34 812.21 812.58 827.83
obs ~ atmos_pres + visibility + dew_point + trend() + season() 0.60 129.11 813.05 817.13 866.16
obs ~ atmos_pres + visibility + dew_point 0.57 128.85 813.43 813.80 829.05
obs ~ ws + dew_point + RH 0.57 127.43 814.12 814.49 829.74
obs ~ dew_point + atmos_pres + RH + visibility 0.57 129.86 814.70 815.22 833.45
obs ~ dew_point + ws 0.56 128.82 816.73 816.98 829.23
obs ~ atmos_pres + dew_point + RH + trend() + season() 0.58 135.52 819.52 823.60 872.62
obs ~ dew_point + visibility + trend() + season() 0.58 134.75 821.77 825.37 871.75
obs ~ dew_point + visibility 0.54 133.94 822.71 822.96 835.21
obs ~ visibility + dew_point + RH 0.55 133.93 822.94 823.31 838.56
obs ~ visibility + dew_point + RH + trend() + season() 0.57 137.03 823.67 827.75 876.78
obs ~ atmos_pres + dew_point + RH 0.54 138.24 825.46 825.83 841.08
obs ~ ws + atmos_pres + visibility + RH 0.54 137.92 825.84 826.36 844.59
obs ~ dew_point + atmos_pres 0.53 139.92 828.17 828.41 840.67
obs ~ atmos_pres + visibility + RH 0.53 140.45 828.20 828.57 843.82
obs ~ dew_point + atmos_pres + trend() + season() 0.56 142.20 828.98 832.59 878.97
obs ~ ws + atmos_pres + visibility 0.53 140.42 829.29 829.66 844.91
obs ~ atmos_pres + visibility + trend() + season() 0.56 142.66 829.72 833.32 879.71
obs ~ atmos_pres + visibility 0.52 141.97 830.41 830.66 842.91
obs ~ ws + visibility + RH 0.53 139.06 830.43 830.80 846.05
obs ~ atmos_pres + visibility + RH + trend() + season() 0.55 145.65 831.65 835.73 884.76
obs ~ RH + dew_point + trend() + season() 0.54 146.35 834.60 838.21 884.59
obs ~ RH + visibility 0.51 143.83 835.83 836.07 848.32
obs ~ ws + visibility 0.51 143.62 836.37 836.61 848.86
obs ~ dew_point + trend() + season() 0.53 147.61 836.76 839.92 883.62
obs ~ RH + dew_point 0.50 146.56 837.65 837.89 850.14
obs ~ dew_point 0.49 147.61 839.43 839.58 848.80
obs ~ visibility 0.49 147.32 840.22 840.36 849.59
obs ~ RH + visibility + trend() + season() 0.51 152.53 844.95 848.55 894.93
obs ~ visibility + trend() + season() 0.51 152.36 845.24 848.40 892.10
obs ~ RH + atmos_pres + trend() + season() 0.50 160.45 848.06 851.66 898.04
obs ~ atmos_pres + trend() + season() 0.49 161.63 850.70 853.86 897.56
obs ~ RH + trend() + season() 0.44 173.89 867.75 870.91 914.61
obs ~ ws + atmos_pres + RH 0.29 211.82 898.92 899.29 914.54
obs ~ RH + atmos_pres 0.27 214.75 900.65 900.89 913.15
obs ~ atmos_pres + ws 0.17 243.21 922.32 922.56 934.81
obs ~ RH 0.17 239.55 922.38 922.52 931.75
obs ~ RH + ws 0.17 239.02 922.58 922.83 935.08
obs ~ atmos_pres 0.13 257.14 931.07 931.22 940.44
obs ~ ws 0.04 276.44 947.31 947.46 956.68
obs^2 ~ ws + atmos_pres + dew_point + RH + trend() + season() 0.49 2392560.97 2452.42 2457.01 2508.65
obs^2 ~ dew_point + atmos_pres + RH + visibility + ws + trend() + , season() 0.48 2408392.05 2454.40 2459.53 2513.75
obs^2 ~ ws + atmos_pres + dew_point + trend() + season() 0.46 2433782.28 2459.12 2463.20 2512.22
obs^2 ~ ws + atmos_pres + visibility + dew_point + trend() + , season() 0.46 2435375.87 2460.16 2464.75 2516.39
obs^2 ~ ws + dew_point + RH + trend() + season() 0.46 2429097.90 2461.62 2465.70 2514.73
obs^2 ~ dew_point + ws + trend() + season() 0.45 2429890.08 2462.65 2466.26 2512.64
obs^2 ~ ws + dew_point + visibility + RH + trend() + season() 0.45 2441721.09 2463.62 2468.21 2519.85
obs^2 ~ ws + visibility + dew_point + trend() + season() 0.45 2435669.59 2464.12 2468.20 2517.23
obs^2 ~ ws + atmos_pres + visibility + dew_point 0.41 2474090.34 2465.32 2465.84 2484.06
obs^2 ~ dew_point + atmos_pres + RH + visibility + ws 0.40 2493988.79 2467.25 2467.95 2489.12
obs^2 ~ ws + atmos_pres + dew_point + RH 0.40 2501729.88 2467.41 2467.93 2486.15
obs^2 ~ dew_point + atmos_pres + RH + visibility + trend() + , season() 0.44 2603047.74 2467.50 2472.09 2523.73
obs^2 ~ atmos_pres + dew_point + RH + trend() + season() 0.43 2612501.40 2467.88 2471.96 2520.99
obs^2 ~ ws + atmos_pres + dew_point 0.39 2512154.08 2468.42 2468.79 2484.04
obs^2 ~ ws + visibility + dew_point 0.39 2457053.03 2468.74 2469.12 2484.36
obs^2 ~ ws + atmos_pres + RH + trend() + season() 0.43 2594236.62 2469.23 2473.31 2522.34
obs^2 ~ atmos_pres + ws + trend() + season() 0.42 2578338.23 2470.36 2473.97 2520.35
obs^2 ~ ws + dew_point + visibility + RH 0.39 2472245.17 2470.74 2471.26 2489.48
obs^2 ~ ws + atmos_pres + visibility + RH + trend() + season() 0.43 2602170.85 2470.83 2475.42 2527.06
obs^2 ~ ws + atmos_pres + visibility + trend() + season() 0.42 2570907.10 2470.91 2474.99 2524.02
obs^2 ~ ws + dew_point + RH 0.38 2511200.00 2472.29 2472.66 2487.91
obs^2 ~ dew_point + ws 0.37 2516630.35 2472.88 2473.12 2485.37
obs^2 ~ atmos_pres + visibility + dew_point + trend() + season() 0.41 2656249.06 2474.56 2478.64 2527.67
obs^2 ~ ws + trend() + season() 0.40 2586244.23 2477.27 2480.42 2524.13
obs^2 ~ atmos_pres + visibility + dew_point 0.36 2669435.51 2477.65 2478.02 2493.27
obs^2 ~ ws + visibility + trend() + season() 0.39 2584816.57 2478.43 2482.03 2528.41
obs^2 ~ RH + ws + trend() + season() 0.39 2617319.62 2479.17 2482.77 2529.15
obs^2 ~ dew_point + atmos_pres + RH + visibility 0.36 2691273.12 2479.59 2480.11 2498.33
obs^2 ~ atmos_pres + dew_point + RH 0.35 2698005.86 2479.60 2479.97 2495.22
obs^2 ~ dew_point + atmos_pres 0.34 2707205.06 2480.38 2480.63 2492.88
obs^2 ~ ws + visibility + RH + trend() + season() 0.39 2615643.56 2480.43 2484.51 2533.54
obs^2 ~ visibility + dew_point + RH + trend() + season() 0.38 2720661.81 2482.57 2486.65 2535.68
obs^2 ~ dew_point + visibility + trend() + season() 0.38 2710909.47 2482.84 2486.44 2532.82
obs^2 ~ dew_point + atmos_pres + trend() + season() 0.38 2794948.80 2482.91 2486.51 2532.89
obs^2 ~ ws + atmos_pres + visibility 0.34 2722084.94 2483.28 2483.65 2498.90
obs^2 ~ ws + atmos_pres + visibility + RH 0.34 2734097.69 2484.40 2484.92 2503.14
obs^2 ~ atmos_pres + visibility 0.33 2763319.67 2484.86 2485.11 2497.36
obs^2 ~ RH + dew_point + trend() + season() 0.37 2779800.59 2485.43 2489.03 2535.41
obs^2 ~ atmos_pres + visibility + RH 0.32 2783587.02 2486.42 2486.79 2502.04
obs^2 ~ dew_point + visibility 0.32 2709209.25 2486.74 2486.98 2499.23
obs^2 ~ visibility + dew_point + RH 0.32 2724697.97 2488.66 2489.03 2504.28
obs^2 ~ atmos_pres + visibility + RH + trend() + season() 0.36 2895426.61 2489.03 2493.11 2542.13
obs^2 ~ atmos_pres + visibility + trend() + season() 0.35 2866756.51 2489.30 2492.91 2539.29
obs^2 ~ ws + visibility + RH 0.31 2717515.96 2490.01 2490.38 2505.63
obs^2 ~ ws + visibility 0.31 2723288.93 2490.16 2490.41 2502.66
obs^2 ~ dew_point + trend() + season() 0.35 2847091.68 2490.67 2493.83 2537.53
obs^2 ~ RH + dew_point 0.30 2777536.37 2490.70 2490.94 2503.19
obs^2 ~ dew_point 0.30 2779114.53 2490.86 2491.01 2500.23
obs^2 ~ visibility 0.28 2800157.59 2494.60 2494.74 2503.97
obs^2 ~ RH + visibility 0.28 2804617.44 2495.16 2495.40 2507.65
obs^2 ~ RH + atmos_pres + trend() + season() 0.33 3016097.26 2495.49 2499.09 2545.47
obs^2 ~ atmos_pres + trend() + season() 0.30 3084874.67 2501.85 2505.01 2548.71
obs^2 ~ visibility + trend() + season() 0.29 2961805.49 2504.01 2507.17 2550.87
obs^2 ~ RH + visibility + trend() + season() 0.29 2994589.85 2505.86 2509.46 2555.84
obs^2 ~ RH + atmos_pres 0.21 3232600.63 2511.81 2512.05 2524.30
obs^2 ~ ws + atmos_pres + RH 0.21 3245335.16 2513.49 2513.86 2529.11
obs^2 ~ RH + trend() + season() 0.23 3201193.91 2517.67 2520.83 2564.53
obs^2 ~ atmos_pres + ws 0.13 3539680.30 2528.26 2528.51 2540.76
obs^2 ~ atmos_pres 0.12 3585321.46 2529.32 2529.47 2538.69
obs^2 ~ RH 0.11 3453464.19 2531.19 2531.33 2540.56
obs^2 ~ RH + ws 0.10 3475305.85 2533.18 2533.42 2545.67
obs^2 ~ ws 0.00 3826031.53 2549.70 2549.85 2559.08

Wybieram z nich modele do dalszego porównania:

  • log10(obs) ~ ws + atmos_pres + dew_point + trend() + season()))

  • log10(obs) ~ dew_point + ws + trend() + season()))

  • log10(obs) ~ ws + visibility + RH + trend() + season()))

  • log(obs) ~ ws + atmos_pres + dew_point + trend() + season()))

fit_m1 <- train %>% 
  model(TSLM(formula = log10(obs) ~ ws + atmos_pres + dew_point + trend() + season())) 

fit_m2 <- train %>% 
  model(TSLM(formula = log10(obs) ~ dew_point + ws + trend() + season())) 

fit_m3 <- train %>% 
  model(TSLM(formula = log10(obs) ~ ws + visibility + RH + trend() + season())) 

fit_m4 <- train %>% 
  model(TSLM(formula = log(obs) ~ ws + atmos_pres + dew_point + trend() + season())) 

4.6 Model ETS


Model ETS to model tworzony przy użyciu metody wygładzenia wykładniczego, gdzie prognozy są ważonymi średnimi z poprzednich obserwacji, a wagi maleją wykładniczo z wiekiem obserwacji. Model ten ma minimalizowac AIC. AIC: 1663.651

fit_ets <- train %>%
  model(ETS(obs))

fit_ets %>% report()
## Series: obs 
## Model: ETS(M,N,M) 
##   Smoothing parameters:
##     alpha = 0.05456618 
##     gamma = 0.0001034682 
## 
##   Initial states:
##      l[0]   s[0]    s[-1]     s[-2]    s[-3]     s[-4]     s[-5]     s[-6]
##  41.09406 1.2268 1.072638 0.9961869 1.121667 0.6531318 0.5883658 0.5550555
##     s[-7]   s[-8]    s[-9]  s[-10]   s[-11]
##  0.626941 0.95563 1.193036 1.53211 1.478437
## 
##   sigma^2:  0.0776
## 
##      AIC     AICc      BIC 
## 1663.651 1666.809 1710.510

4.7 Model ARIMA


Model Arima ma na celu opisanie autokorelacji danych. AIC w tym przypadku wyniosło 1391.58.

fit_arima <- train %>%
  model(ARIMA(obs))

fit_arima %>% report()
## Series: obs 
## Model: ARIMA(0,0,2)(0,0,2)[12] w/ mean 
## 
## Coefficients:
##          ma1     ma2    sma1    sma2  constant
##       0.3287  0.1761  0.1491  0.1331   38.2222
## s.e.  0.0805  0.0734  0.0823  0.0749    2.1289
## 
## sigma^2 estimated as 221.3:  log likelihood=-689.79
## AIC=1391.58   AICc=1392.1   BIC=1410.32

4.8 Model regresji dynamicznej


AIC w tym przypadku wyniosło 1298.81.

fit_dyn<- train %>%
  model(ARIMA(obs ~ dew_point + I(dew_point^2)))

fit_dyn %>% report()
## Series: obs 
## Model: LM w/ ARIMA(1,0,0) errors 
## 
## Coefficients:
##          ar1  dew_point  I(dew_point^2)  intercept
##       0.2257    -2.5128          0.0832    44.7907
## s.e.  0.0759     0.2320          0.0217     1.4745
## 
## sigma^2 estimated as 128.7:  log likelihood=-644.4
## AIC=1298.81   AICc=1299.18   BIC=1314.43

4.9 Dynamiczna regresja harmoniczna


Model przetestowano dla K ={1,..,6}. Najlepsze okazały się K=5, K=6.

fit_harm <- train %>%
  model(
    `K = 1` = ARIMA(log(obs) ~ fourier(K = 1) + PDQ(0,0,0)),
    `K = 2` = ARIMA(log(obs) ~ fourier(K = 2) + PDQ(0,0,0)),
    `K = 3` = ARIMA(log(obs) ~ fourier(K = 3) + PDQ(0,0,0)),
    `K = 4` = ARIMA(log(obs) ~ fourier(K = 4) + PDQ(0,0,0)),
    `K = 5` = ARIMA(log(obs) ~ fourier(K = 5) + PDQ(0,0,0)),
    `K = 6` = ARIMA(log(obs) ~ fourier(K = 6) + PDQ(0,0,0))
  )

fit_harm %>%
  forecast(h = "1 year") %>%
  autoplot(train, size = 1, level = 80) +
  facet_wrap(vars(.model), ncol = 2) +
  guides(colour = FALSE) +
  geom_label(
    aes(x = yearmonth("2019 Jan"), y = 100, label = paste0("AICc = ", format(AICc))),
    data = glance(fit_ar_fiur)
  )

fit_harm %>% accuracy()
## # A tibble: 6 × 10
##   .model .type       ME  RMSE   MAE   MPE  MAPE  MASE RMSSE    ACF1
##   <chr>  <chr>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>   <dbl>
## 1 K = 1  Training 0.623  12.8  7.74 -4.98  18.9 0.702 0.676 -0.0277
## 2 K = 2  Training 0.766  12.4  7.23 -4.06  17.5 0.656 0.653  0.0540
## 3 K = 3  Training 0.766  12.4  7.22 -4.06  17.4 0.655 0.653  0.0540
## 4 K = 4  Training 0.774  12.4  7.18 -4.02  17.3 0.651 0.652  0.0552
## 5 K = 5  Training 0.772  12.3  7.19 -4.02  17.4 0.652 0.650  0.0598
## 6 K = 6  Training 0.770  12.3  7.19 -4.02  17.4 0.652 0.650  0.0591

5. Ocena modeli


Najlepsze dopasowanie wykresu funkcji do danych niż w przypadku innych modeli jest przy regresji wielorakiej oraz modelach m1,m2,m3,m4.

bind_rows(fit %>%
            augment() %>%
            mutate(nazwa = "Liniowa"),
          fit_multi %>%
            augment() %>%
            mutate(nazwa = "Wieloraka"),
          fit_Radom %>%
            augment() %>%
            mutate(nazwa = "Season + trend"),
          fit_k2 %>%
            augment() %>%
            mutate(nazwa = "Fourier k2"),
          fit_arima %>%
            augment() %>%
            mutate(nazwa = "Arima"),
          fit_ets %>%
            augment() %>%
            mutate(nazwa = "ETS"),
          fit_dyn %>%
            augment() %>%
            mutate(nazwa = "regresja dynamiczna"),
          fit_m1 %>%
            augment() %>%
            mutate(nazwa = "log10(obs) ~ ws + atmos_pres + dew_point + trend() + season()"),
          fit_m2 %>%
            augment() %>%
            mutate(nazwa = "log10(obs) ~ dew_point + ws + trend() + season()"),
          fit_m3 %>%
            augment() %>%
            mutate(nazwa = "log10(obs) ~ ws + visibility + RH + trend() + season()"),
          fit_m4 %>%
            augment() %>%
            mutate(nazwa = "log(obs) ~ ws + atmos_pres + dew_point + trend() + season()"),
          
) -> dopasowanie

dopasowanie %>%
  ggplot(aes(x = date)) +
  geom_line(aes(y = .fitted,     color = "model")) +
  geom_line(aes(y = obs, color = "dane")) +
  facet_wrap(~nazwa, ncol = 1) +
  labs(color = "Reprezentacja")

Modele regresji wielorakiej oraz modelach m1,m2,m3,m4 są najlepiej wpasowanie do linii.

dopasowanie %>%
  ggplot(aes(x = obs, y = .fitted, color = nazwa)) +
  geom_point() +
  geom_abline(intercept = 0, slope = 1) +
  facet_wrap(~nazwa, ncol = 1)

Najwyższą korelacje do danych PM10 wykazuje model 1 oraz 4 czyli: M1 - log10(obs) ~ ws + atmos_pres + dew_point + trend() + season(), M4 - log(obs) ~ ws + atmos_pres + dew_point + trend() + season()

dopasowanie %>% 
  as.data.frame() %>% 
  group_by(nazwa) %>% 
  summarise(korelacja = cor(obs, .fitted), 
            res_mean = mean(.resid),
            res_med = median(.resid))%>%
  arrange(korelacja) %>% 
  knitr::kable(digits = 2)
korelacja res_mean res_med
0.75 0.16 -0.91

##Test normalności modeli Wszystkie modele posiadają rozkłąd normalny oprócz modeli: m1, m2, m3, m4

# f1=fit%>% residuals() %>% pull(.resid) %>%shapiro.test();f1
# f2=fit_multi %>% residuals() %>% pull(.resid) %>%shapiro.test();f2
# f3=fit_Radom %>% residuals() %>% pull(.resid) %>%shapiro.test();f3
# f4=fit_k2 %>% residuals() %>% pull(.resid) %>%shapiro.test();f4
# f5=fit_arima %>% residuals() %>% pull(.resid) %>%shapiro.test();f5
# f6=fit_ets %>% residuals() %>% pull(.resid) %>%shapiro.test();f6
# f7=fit_dyn %>% residuals() %>% pull(.resid) %>%shapiro.test();f7
# f8=fit_m1 %>% residuals() %>% pull(.resid) %>%shapiro.test();f8
# f9=fit_m2 %>% residuals() %>% pull(.resid) %>%shapiro.test();f9
# f10=fit_m3 %>% residuals() %>% pull(.resid) %>%shapiro.test();f10
# f11=fit_m4 %>% residuals() %>% pull(.resid) %>%shapiro.test();f11

fit_multi %>% residuals() %>% pull(.resid) %>%shapiro.test()
## 
##  Shapiro-Wilk normality test
## 
## data:  .
## W = 0.7637, p-value = 3.638e-15
normalnosc <- data.frame(model = c("Regresja liniowa prosta", "Regresja wieloraka", "Season + trend", "Fourier k2", "Arima", "ETS", "Regresja dynamiczna", "m1", "m2", "m3", "m4"), pvalue=c(9.637e-09, 2.638e-05, 9.316e-12, 6.815e-12, 2.256e-11, 1.737e-05, 2.348e-07, 0.7278, .74, 0.1148, 0.7278))
datatable(normalnosc)
t1 <- fit%>% 
  augment() %>% 
  features(.resid, ljung_box, lag = 20, dof = 2)

t2 <- fit_multi %>% 
  augment() %>% 
  features(.resid, ljung_box, lag = 20, dof = 2)

t3 <- fit_Radom %>% 
  augment() %>% 
  features(.resid, ljung_box, lag = 20, dof = 2)

t4 <- fit_k2 %>% 
  augment() %>% 
  features(.resid, ljung_box, lag = 20, dof = 2)

t5 <- fit_ets %>% 
  augment() %>% 
  features(.resid, ljung_box, lag = 20, dof = 2)

t6 <- fit_arima %>% 
  augment() %>% 
  features(.resid, ljung_box, lag = 20, dof = 2)

t7 <- fit_dyn %>% 
  augment() %>% 
  features(.resid, ljung_box, lag = 20, dof = 2)


t8 <- fit_m1 %>% 
  augment() %>% 
  features(.resid, ljung_box, lag = 20, dof = 2)

t9 <- fit_m2 %>% 
  augment() %>% 
  features(.resid, ljung_box, lag = 20, dof = 2)

t10 <- fit_m3 %>% 
  augment() %>% 
  features(.resid, ljung_box, lag = 20, dof = 2)

t11 <- fit_m4 %>% 
  augment() %>% 
  features(.resid, ljung_box, lag = 20, dof = 2)


t <- data.frame(rbind(t1,t2,t3,t4, t5, t6, t7, t8, t9, t10, t11))
datatable(t)
#ej mordo tutaj chyba odrzucamy hipoteze zerowa, czyli reszty sa ze soba skorelowane

# Odrzucamy hipotezę alternatywną o autokorelacji reszt. Podrzymujemy hipotezę
# zerową, że reszty nie są z sobą skorelowane (pvalu > 0.05)

Wyjwyższe R^2 posiadał model wieloraki.

porownanie <- map(.x = list(fit, fit_multi, fit_Radom, fit_k2),
    .f = ~glance(.x) %>%
      dplyr::select(adj_r_squared, CV, AIC, AICc, BIC)
    ) %>%
  do.call(rbind, .)

porownanie <- cbind(c("Liniowa","Wieloraka", "trend + season", "Fourier K=2"), porownanie)
names(porownanie)[1] <- "Model"
datatable(porownanie)
test_1 <- fit %>% forecast(new_data = test)
test_2 <- fit_multi %>% forecast(new_data = test)
test_3 <- fit_k2 %>% forecast(new_data = test)
test_4 <- fit_arima %>% forecast(new_data = test)
test_5 <- fit_ets %>% forecast(new_data = test)
test_6 <- fit_dyn %>% forecast(new_data = test)


stats <- bind_rows(
  fit %>% accuracy(),
  fit_multi %>% accuracy(),
  fit_k2 %>% accuracy(),
  fit_arima %>% accuracy(),
  fit_ets %>% accuracy(),
  fit_dyn %>% accuracy(),
  test_1 %>% accuracy(test),
  test_2 %>% accuracy(test),
  test_3 %>% accuracy(test),
  test_4 %>% accuracy(test),
  test_5 %>% accuracy(test),
  test_6 %>% accuracy(test)
)

stats <- stats %>% mutate_at(vars(RMSE, MAE, MPE, MAPE, MASE, RMSSE, ACF1), funs(round(., 3)))

datatable(stats)

Wybraliśmy najlepsze modele do dalszej analizy modele: regresji wielorakiej, regresji dynamicznej, regresji liniowej

Posiadały one najdokładniejsze dopasowania do rzeczywistego stężenia PM10 oraz najlepsze wyniki w testach. Dodatkowo spełniały warunek normalności.

Analizujemy również reszty tych modeli. W wybranych modelach nie wystepuje autokorelacja reszt. Wariacja bliska zeru.

fit%>% 
  gg_tsresiduals()

fit_multi%>% 
  gg_tsresiduals()

fit_dyn%>% 
  gg_tsresiduals()


6. Prognozy


Najlepszą okazała się prognoza modelu regresji wielorakiej.

prog1<-  fit_multi %>% 
  forecast(test, h = "6 month") %>% 
  autoplot(Radom_m) + 
  labs(title = "Wieloraka", y = "PM10")

prog2<-fit_dyn %>% 
  forecast(test, h = "6 month") %>% 
  autoplot(Radom_m) + 
  labs(title = "Dynamiczna", y = "PM10")

prog3<-fit %>% 
  forecast(test, h = "6 month") %>% 
  autoplot(Radom_m) + 
  labs(title = "Liniowa prosta", y = "PM10")


grid.arrange(prog1, prog2, prog3, ncol = 1)


6. Prognozny podstawowymi metodami


Metoda średniej

# Poniżej przedstawimy krótką prezentację 4 podstawowych i bardzo prostych
# metody prognozowania, które bywaja często skuteczne i wystarczające.

# **Metoda średniej** - prognozy przyszłych wartości są równe wartości średniej
# z danych historycznych

m1 <- train %>% model(MEAN(obs))

Metoda Naiwne (prosta)

# **Metoda Naiwne (prosta)** - 'Naiwne' ponieważ zakłądają, że czynniki określające
# zmienną prognozowaną są stałe (niezmienne). Prognozą jest wartość ostatniej
# obserwacji. Tak zwane prognozy losowego marszu.

m2 <- train %>% model(NAIVE(obs))

Metoda Naiwne (sezoowa)

# **Metoda naiwna (sozenowa)** - prognoza jest równan ostatniej obserwacji z
# każdego sezonu (pory roku, miesiaca, tygodnia itd..)

m3 <- train %>% model(SNAIVE(obs ~ lag("year")))

Metoda dryfu (naiwna)

# **Metaoda dryfu (naiwna)** -  prognozy mogą rosnąć lub maleć z czasem, przy
# czym zmianan ta jest średnią zmianą danych historycznych.

m4 <- train %>% model(RW(obs ~ drift()))

Najlepszą z prognoz okazała się SNAIVE.

# wyniki prognoz

gridExtra::grid.arrange(m1 %>% forecast() %>% autoplot(train) + ggtitle("MEAN"),
                        m2 %>% forecast() %>% autoplot(train) + ggtitle("NAIVE"),
                        m3 %>% forecast() %>% autoplot(train) + ggtitle("SNAIVE"),
                        m4 %>% forecast() %>% autoplot(train) + ggtitle("RW"))

# wyniki prognoz

gridExtra::grid.arrange(m1 %>% forecast() %>% autoplot(Radom_m) + ggtitle("MEAN"),
                        m2 %>% forecast() %>% autoplot(Radom_m) + ggtitle("NAIVE"),
                        m3 %>% forecast() %>% autoplot(Radom_m) + ggtitle("SNAIVE"),
                        m4 %>% forecast() %>% autoplot(Radom_m) + ggtitle("RW"))


6. Podsumowanie


Najlepszym modelem okazał się model regresji wielorakiej, przez całą analizę posiadał bardzo dobre parametry oraz posiadał najlepszą prognozę. Zadowalające wyniki osiągnęły również modele regresji prostej liniowej oraz regresji dynamicznej. Z prognozowania prostymi metodami okazał się najlepszy model SNAIVE.